## Superscalar:

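Goal: execute multiple instructions per cycle, i.e., exploit ILP (instruction-level parallelism).

Ways to reduce program execution time:

- 1. Reduce the latency of individual instructions
- 2. Increase throughput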

Parallel instruction processing requires: the determination of the dependence relationships between instructions, adequate hardware resources to execute multiple operations in parallel, strategies to determine when an operation is ready for execution, and techniques to pass values from one operation to another. When the effects of instructions are committed, the appearance of sequential execution must be maintained.

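Steps:

I. Fetch and decode the instruction stream (multiple instructions at a time)

II. Conditional branch prediction, to ensure the instruction stream is not interrupted by branch jumps

III. Analyze the dependences between instructions and distribute instructions to the functional units for parallel execution

IV. Initiate instructions for execution based on the availability of operand data

V. Commit instructions so that the order of their effects matches the original instruction stream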

## What must be implemented for parallel instruction processing

- 1) Instruction fetch strategies that simultaneously fetch multiple instructions, often by predicting the outcomes of, and fetching beyond, conditional branch instructions.
- 2) Methods for determining true dependences involving register values, and mechanisms for communicating these values to where they are needed during execution.

- 3) Methods for initiating, or *issuing*, multiple instructions in parallel.
- 4) Resources for parallel execution of many instructions, including multiple pipelined functional units and memory hierarchies capable of simultaneously servicing multiple memory references.
- 5) Methods for communicating data values through memory via load and store instructions, and memory interfaces that allow for the dynamic and often unpredictable performance behavior of memory hierarchies. These interfaces must be well matched with the instruction execution strategies.
- 6) Methods for committing the process state in correct order; these mechanisms maintain an outward appearance of sequential execution.

## Control hazard:

As a static program executes with a specific set of input data, the sequence of executed instructions forms a dynamic instruction stream. As long as the instructions to be executed are consecutive, static instructions enter the dynamic sequence simply by incrementing the program counter. When there is a conditional branch or jump, however, the program counter may be updated to a nonconsecutive address. An instruction is said to be control dependent on its preceding dynamic instruction(s), because the flow of program control must pass through preceding instructions first. The two methods of modifying the program counter, incrementing and updating, result in two types of control dependences (though typically when people talk about control dependences, they tend to ignore the former).

The first step in increasing instruction-level parallelism is to overcome control dependences. Control dependences due to an incrementing program counter are the simplest, and we deal with them first. One can view the static program as a collection of basic blocks, where a basic block is a contiguous block of instructions with a single entry point and a single exit point. In the assembly code of Fig. 1, there are three basic blocks. The first basic block consists of the five instructions starting at the label L2, the second consists of the next five instructions up to the label L3, and the third consists of the three instructions starting at the label L3. Once a basic block has been entered, all of its instructions will eventually be executed, so any sequence of instructions in a basic block can be placed into the window of execution together.
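The basic-block definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Instr` record, the `split_basic_blocks` helper, and the six-instruction example program are hypothetical and are not the code of Fig. 1. A block is closed after every branch and a new one is opened at every labelled instruction, which gives each block a single entry and a single exit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    text: str                     # assembly text, for display only
    label: Optional[str] = None   # label attached to this instruction, if any
    is_branch: bool = False       # True for conditional branches and jumps


def split_basic_blocks(program):
    """Split a linear instruction list into basic blocks.

    A new block starts at every labelled instruction (a possible branch
    target) and right after every branch (the fall-through path).
    """
    blocks, current = [], []
    for ins in program:
        if ins.label is not None and current:
            blocks.append(current)   # a label opens a new block
            current = []
        current.append(ins)
        if ins.is_branch:
            blocks.append(current)   # a branch closes the current block
            current = []
    if current:
        blocks.append(current)
    return blocks


# Hypothetical example program (not taken from the source figure).
program = [
    Instr("lw   r1, 0(r2)", label="loop"),
    Instr("add  r3, r3, r1"),
    Instr("beq  r1, r0, skip", is_branch=True),
    Instr("addi r4, r4, 1"),
    Instr("addi r2, r2, 4", label="skip"),
    Instr("bne  r2, r5, loop", is_branch=True),
]
blocks = split_basic_blocks(program)
sizes = [len(b) for b in blocks]
```

Here the program splits into three blocks of 3, 1, and 2 instructions. The sketch treats every label as a potential branch target, which is conservative but keeps the single-entry property.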


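Once one instruction of a basic block is executed, all of its instructions will eventually be executed, so the basic block is taken as the window of execution. Instructions within a basic block can execute in parallel unless there is a data dependency. To exploit as much parallelism as possible, however, branch prediction / speculation is needed to guess whether a branch is taken or not taken.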
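As a rough illustration of "parallel unless there is a data dependency", the sketch below assigns each instruction in a window the earliest cycle permitted by its true (RAW) dependences. It assumes unit latency, unlimited functional units, and that WAR/WAW conflicts have already been removed; `schedule_window` and the example window are hypothetical names chosen for this note.

```python
def schedule_window(window):
    """Assign each instruction the earliest cycle allowed by RAW dependences.

    `window` is a list of (dest, sources) tuples in program order.
    Assumptions: every instruction takes one cycle, there are enough
    functional units, and only true dependences constrain issue.
    """
    ready_cycle = {}   # register -> cycle in which its newest value is available
    cycles = []
    for dest, sources in window:
        # An instruction can start once all of its source values are ready.
        start = max((ready_cycle.get(r, 0) for r in sources), default=0)
        cycles.append(start)
        if dest is not None:
            ready_cycle[dest] = start + 1
    return cycles


# Hypothetical window: (destination register, source registers)
window = [
    ("r1", ["r2"]),         # lw   r1, 0(r2)
    ("r3", ["r4"]),         # lw   r3, 0(r4)
    ("r5", ["r1", "r3"]),   # add  r5, r1, r3   (needs both loads)
    ("r6", ["r5"]),         # sll  r6, r5, 2    (needs the add)
    ("r2", ["r2"]),         # addi r2, r2, 4    (independent of the chain)
]
cycles = schedule_window(window)
total_cycles = max(cycles) + 1
```

With these assumptions the five instructions finish in three cycles instead of five, because the two loads and the `addi` can all start in cycle 0.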

Fig. 3. A conceptual figure of superscalar execution. Processing phases are listed across the top of the figure.

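Basic block definition: the instructions inside the block are executed sequentially, and once the first instruction in the block starts executing, execution is guaranteed to continue through to the last one.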

Instructions in the program are executed as shown in the figure below:

Microarchitecture of a superscalar processor:

Fig. 4. Organization of a superscalar processor. Multiple paths connecting units are used to illustrate a typical level of parallelism. For example, four instructions can be fetched in parallel, two integer/address instructions can issue in parallel, two floating point instructions can complete in

Phase I. Instruction fetching & branch prediction

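Multiple instructions are fetched per PC value, so the instruction cache should be designed with a block size of several instructions; otherwise cache misses are likely and performance suffers. High fetch bandwidth is also needed (several instructions are fetched at once), which is why a split cache (separate instruction and data caches) is used.

In addition, high-bandwidth instruction fetch uses prefetching to reduce the miss rate.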
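The interaction between fetch width and cache block size can be sketched as follows. The sketch assumes that one cache block is read per cycle, so a fetch group cannot cross a block boundary; `fetch_group` and its default parameters (16-byte blocks, 4-byte instructions, 4-wide fetch) are hypothetical values chosen for illustration.

```python
def fetch_group(pc, block_bytes=16, instr_bytes=4, fetch_width=4):
    """Return the instruction addresses fetched in one cycle starting at `pc`.

    Assumption: only one cache block is read per cycle, so the group stops
    at the block boundary even if fewer than `fetch_width` instructions
    have been collected.
    """
    block_end = (pc // block_bytes + 1) * block_bytes
    count = min(fetch_width, (block_end - pc) // instr_bytes)
    return [pc + i * instr_bytes for i in range(count)]


aligned = fetch_group(0x100)     # starts at the beginning of a block
unaligned = fetch_group(0x108)   # starts in the middle of a block
```

Under these assumptions a fetch that starts at the beginning of a block delivers four instructions, while one that starts in the middle (for example at a branch target) delivers only two.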

Furthermore, redirecting instruction fetch at a branch instruction causes delay. Processing of conditional branch instructions can be broken down into the following parts:

1) recognizing that an instruction is a conditional branch,

2) determining the branch outcome (taken or not taken),

3) computing the branch target, and

4) transferring control by redirecting instruction fetch (in the case of a taken branch).

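(1). Predecode logic can be used to help determine whether an instruction is a branch; this information is stored in the instruction cache. The branch instruction opcode (op0~op6) is predecoded, and the predecode bits can even identify the instruction type.

(2). When a branch instruction is fetched, its data or operands may not be ready yet (e.g., because of a dependency on other instructions), so a predictor is used to predict the outcome:

- Static branch prediction: uses information from compiler profiling, encoded in the static binary.
- Dynamic branch prediction: uses previous execution history at run time to predict; it needs hardware support in the form of a branch prediction table.

(3). Computing the branch target address generally requires an adder: the branch addressing mode defined by the ISA is generally PC + offset, so no register read is needed. A branch target buffer can additionally be used to speed up this step.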
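Steps (2) and (3) can be illustrated with a small dynamic predictor. The notes only say "branch prediction table" and "branch target buffer"; the 2-bit saturating counters, the table size, the indexing function, and the class and method names below are assumptions made for this sketch, and `branch_target` uses a simplified PC + offset rule (real ISAs often add the offset to the address of the following instruction).

```python
class BranchPredictor:
    """Prediction table of 2-bit saturating counters plus a small BTB."""

    def __init__(self, entries=16):
        self.entries = entries
        self.counters = [1] * entries   # 0..3; start at "weakly not taken"
        self.btb = {}                   # branch PC -> last seen target

    def _index(self, pc):
        return (pc >> 2) % self.entries   # assumes 4-byte instructions

    def predict(self, pc):
        """Return (predicted_taken, predicted_target or None)."""
        taken = self.counters[self._index(pc)] >= 2
        return taken, (self.btb.get(pc) if taken else None)

    def update(self, pc, taken, target):
        """Train the counter (and the BTB on taken branches)."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
            self.btb[pc] = target
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def branch_target(pc, offset):
    """Simplified PC-relative target: no register read, only an adder."""
    return pc + offset


# Hypothetical loop branch at 0x40: taken 9 times, then falls through once.
bp = BranchPredictor()
pc, target = 0x40, branch_target(0x40, -0x20)
mispredictions = 0
for actual_taken in [True] * 9 + [False]:
    predicted_taken, _ = bp.predict(pc)
    if predicted_taken != actual_taken:
        mispredictions += 1
    bp.update(pc, actual_taken, target)
```

In this run the predictor is wrong only on the first iteration and on the loop exit. Because the counter needs two consecutive not-taken outcomes to flip, a single loop exit does not erase the learned "taken" behavior.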

Phase II. Instruction Decoding, Renaming, and Dispatch

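Instructions are removed from the instruction buffer, and the data & control dependencies between instructions are detected.

This includes:

- true data hazard detection (RAW)
- anti and output dependences (WAR, WAW)

Instructions are then dispatched to the buffers of the individual functional units.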
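The three dependence types checked in this phase can be sketched as follows. Instructions are modeled as hypothetical `(dest, sources)` tuples, and the `rename` helper is a minimal illustration of register renaming that gives every result a fresh physical register name (`p0`, `p1`, ...); neither function reflects a specific processor's implementation.

```python
def classify_hazards(earlier, later):
    """Return the hazards between two instructions given in program order.

    Each instruction is a (dest, sources) tuple; dest may be None.
    """
    e_dest, e_srcs = earlier
    l_dest, l_srcs = later
    hazards = set()
    if e_dest is not None and e_dest in l_srcs:
        hazards.add("RAW")   # later reads what earlier writes (true dependence)
    if l_dest is not None and l_dest in e_srcs:
        hazards.add("WAR")   # later overwrites what earlier reads (anti)
    if e_dest is not None and e_dest == l_dest:
        hazards.add("WAW")   # both write the same register (output)
    return hazards


def rename(window):
    """Give every destination a fresh physical register name."""
    mapping, renamed, next_id = {}, [], 0
    for dest, srcs in window:
        new_srcs = [mapping.get(s, s) for s in srcs]   # read current mappings
        new_dest = None
        if dest is not None:
            new_dest = f"p{next_id}"
            next_id += 1
            mapping[dest] = new_dest
        renamed.append((new_dest, new_srcs))
    return renamed


# Hypothetical three-instruction window.
i0 = ("r1", ["r2", "r3"])   # add r1, r2, r3
i1 = ("r2", ["r1", "r4"])   # sub r2, r1, r4
i2 = ("r1", ["r5"])         # mov r1, r5

before_01 = classify_hazards(i0, i1)
before_02 = classify_hazards(i0, i2)
r0, r1, r2 = rename([i0, i1, i2])
after_01 = classify_hazards(r0, r1)
after_02 = classify_hazards(r0, r2)
```

Before renaming, `i0`/`i1` show both a RAW and a WAR hazard and `i0`/`i2` show a WAW hazard. After renaming only the RAW dependence remains, which is why anti and output dependences are treated separately from true data hazards.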